Outline of the FLJ human cDNA database
The FLJ human cDNA database was constructed as a human cDNA sequence analysis
database focused on the mRNA varieties caused by variations in transcription
start site (TSS) and splicing.
The number of human genes has been estimated at 20-25 thousand, whereas the
number of human mRNA varieties has been predicted to be about 100 thousand.
This variety is thought to be caused by variations in TSS and splicing. In our
previous human cDNA project, about 30 thousand FLJ human full-length sequenced
cDNAs were deposited in DDBJ/GenBank/EMBL, and we obtained about 1.4 million
5'-end sequences (5'-ESTs) of FLJ full-length cDNAs from about 100 kinds of
cDNA libraries, which were constructed from human tissues and cells by the
oligo-capping method. The majority of the cDNA inserts were over 2 kb in size,
and the full-length rate at the 5'-end was 90%. Our FLJ cDNAs covered about 80%
of human genes. Against this background, we developed efficient systems for
cloning and evaluating human splicing-variant cDNAs in our project. About 22
thousand finished-grade full-length sequenced cDNAs were obtained in this
project.
We then constructed sequence analysis databases focused on mRNA variations,
using human genome and cDNA sequences: FLJ full-length sequenced cDNAs, 5'-ESTs
of FLJ full-length cDNAs, and the other cDNA sequences described below. These
sequences were mapped onto the human genome sequence, and the cDNA sequences
were then clustered based on the mapping results. The functional annotations
described below were performed, and they can be searched and viewed in this
database.
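The mapping-then-clustering step can be illustrated with a minimal sketch. The
clustering criterion actually used for this database is not stated here, so the
rule below (cDNAs mapped to the same chromosome and strand with overlapping
genomic spans fall into one cluster) is only an assumption, and the names
`Mapping` and `cluster_by_overlap` as well as the example coordinates are
hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Mapping(NamedTuple):
    """One cDNA mapped onto the genome (hypothetical record layout)."""
    cdna_id: str
    chrom: str
    strand: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def cluster_by_overlap(mappings):
    """Group cDNAs whose genomic spans overlap on the same chromosome and strand.

    Assumption: span overlap is the clustering rule; the database may use a
    different criterion (for example, shared exons or splice sites).
    """
    by_locus = defaultdict(list)
    for m in mappings:
        by_locus[(m.chrom, m.strand)].append(m)

    clusters = []
    for key in sorted(by_locus):
        group = sorted(by_locus[key], key=lambda m: (m.start, m.end))
        current = [group[0]]
        current_end = group[0].end
        for m in group[1:]:
            if m.start < current_end:
                # Overlaps the running cluster: extend it.
                current.append(m)
                current_end = max(current_end, m.end)
            else:
                # Gap found: close the running cluster and start a new one.
                clusters.append([x.cdna_id for x in current])
                current = [m]
                current_end = m.end
        clusters.append([x.cdna_id for x in current])
    return clusters


# Made-up coordinates, for illustration only.
example = [
    Mapping("cDNA_1", "chr1", "+", 100, 500),
    Mapping("cDNA_2", "chr1", "+", 400, 900),
    Mapping("cDNA_3", "chr1", "+", 2000, 2500),
    Mapping("cDNA_4", "chr1", "-", 150, 450),
]
example_clusters = cluster_by_overlap(example)
print(example_clusters)
```

In this sketch each cluster is simply a list of cDNA identifiers; variants
within a cluster (different TSS or splicing patterns) would then be compared
against each other.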
Annotations:
a) cDNA information
   - Annotation A1: genome locus information of cDNA sequences
   - Annotation A2: functional annotations of cDNA and the translated amino
     acid sequences, such as BLAST analysis results, Pfam, PROSITE, PSORT,
     SignalP, SOSUI and GO (Gene Ontology)
b) cDNA cluster information
   - Annotation A3: genome position and locus information of cDNA clusters
   - mRNA variation viewer*
   * including expression profiles based on 5'-ESTs of FLJ cDNAs with a high
     full-length rate.
Data set:
1) Human cDNA sequences obtained by the oligo-capping method and used in this DB
   - Full-length cDNA sequence data : 52,120 (finished grade)
     a) FLJ full-length cDNA sequence data by FLJ-PJ : 30,326
     b) FLJ full-length cDNA sequence data by human cDNA sequencing project
        focused on splicing variants of mRNAs in NEDO FAP-PJ :
        21,794 (finished grade) + 3,282 (draft grade)
   - FLJ ESTs : 5'-EST 1,456,213 & 3'-EST 109,283
2) Other human cDNA sequences from public DBs used in this DB
   - Full-length sequenced cDNAs (KIAA, MGC, DKFZ etc.) : 52,126
   - RefSeq (human) and Ensembl (human gene transcripts) : 77,346
   - UniGene, human ESTs : 5'-EST 2,699,311* & 3'-EST 1,638,884*
   * the approximately 1.6 million FLJ EST sequences deposited by us are
     excluded
3) Human genome sequences
   - UCSC hg18 (NCBI Build 36.1)
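
As a quick consistency check on the counts listed above, the short sketch below
adds up the sub-totals. All figures are taken from the data set list; only the
variable names are made up for illustration.

```python
# Finished-grade full-length cDNAs: FLJ-PJ plus NEDO FAP-PJ.
flj_pj_finished = 30_326
nedo_fap_pj_finished = 21_794
full_length_finished = flj_pj_finished + nedo_fap_pj_finished

# FLJ ESTs: 5'-ESTs plus 3'-ESTs.
flj_5_est = 1_456_213
flj_3_est = 109_283
flj_est_total = flj_5_est + flj_3_est

print(full_length_finished)  # 52120, matching the listed finished-grade total
print(flj_est_total)         # 1565496, i.e. the "about 1.6 million" FLJ ESTs
```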